@sureshanaparti
Contributor

@sureshanaparti sureshanaparti commented Feb 18, 2025

Description

This PR makes the following improvements to management server (MS) maintenance:

  • Sends a 503 (Service Unavailable) response status when maintenance or shutdown is initiated
    [Any load balancer in the clustered environment can then avoid routing requests to this MS node]
  • Migrates systemvm agents before routing host agents
  • Updates last agents (using the msid)
  • Adds events for maintenance and shutdown operations
  • Stops the agent connections monitor during maintenance and (re)starts it when maintenance is cancelled
  • Runs the setup MS list and migrate agent connections operations through an executor service
  • Fixes the connected host update for indirect agents rebalance (so that agents connect to the preferred host)
  • Some code improvements

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Manually tested the changes.

(cmk) 🐱 > list managementservers filter=uuid,name,state,pendingjobscount,agentscount,
count = 3
managementserver:
+------+-----------------------------------------+-------------+------------------+-------------+
| UUID |                  NAME                   |    STATE    | PENDINGJOBSCOUNT | AGENTSCOUNT |
+------+-----------------------------------------+-------------+------------------+-------------+
|      | ref-trl-7940-k-m7-suresh-anaparti-mgmt1 | Maintenance |                0 |           0 |
|      | ref-trl-7940-k-m7-suresh-anaparti-mgmt2 | Up          |                0 |           8 |
|      | ref-trl-7940-k-m7-suresh-anaparti-mgmt3 | Up          |                0 |           4 |
+------+-----------------------------------------+-------------+------------------+-------------+

(cmk) 🐱 > list managementserversmetrics filter=name,agentcount,agents,lastagents
count = 3
managementserver:
+-------------------------------------------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                            NAME                             | AGENTCOUNT |                                                                                                                                                          AGENTS                                                                                                                                                           |                                                                          LASTAGENTS                                                                           |
+-------------------------------------------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ref-trl-7940-k-m7-suresh-anaparti-mgmt1 |          0 | []                                                                                                                                                                                                                                                                                                                        | ["4933c5ad-1b83-43ad-8f84-e3419b235657","8b57630e-6202-4c51-914c-6f4ddce52d79","b671e13f-a5eb-4ed2-b99d-080f3b776d0e","7d476ad8-c03d-4fdb-8c0d-f34f0f172d0b"] |
| ref-trl-7940-k-m7-suresh-anaparti-mgmt2 |          8 | ["12462284-f84f-4b24-befa-4b75d0982015","13a5b8ae-2b4b-4977-91d7-f97adb7db564","e8de2144-efb3-408c-99a8-a2592fe2c7d9","794f8261-0a73-4edc-a829-5764deb266e8","bb832b60-f1e7-4601-b50f-8385636ada99","095d3cb6-51f6-4283-9960-e633b8d72cec","d72a4703-1770-445c-8d83-5193eb39cad7","c93e0bd4-9997-44d1-a730-697c7a11512f"] |                                                                                                                                                               |
| ref-trl-7940-k-m7-suresh-anaparti-mgmt3 |          4 | ["4933c5ad-1b83-43ad-8f84-e3419b235657","8b57630e-6202-4c51-914c-6f4ddce52d79","b671e13f-a5eb-4ed2-b99d-080f3b776d0e","7d476ad8-c03d-4fdb-8c0d-f34f0f172d0b"]                                                                                                                                                             |                                                                                                                                                               |
+-------------------------------------------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

503 Service Unavailable response =>

(cmd) 🐱 > create volume diskofferingid=b8199316-68a0-4328-b97a-de50a9f65474 zoneid=02302e8a-be46-42b3-91fd-30c59b4a530d 
🙈 Error: (HTTP 503, error code 9999) Maintenance or Shutdown has been initiated on this management server. Can not accept new jobs

Request:
GET /client/api/?zoneid=02302e8a-be46-42b3-91fd-30c59b4a530d&diskofferingid=b8199316-68a0-4328-b97a-de50a9f65474&name=testvol&command=createVolume&response=json&sessionkey=xGzNFCzSuhI1eh-uG9Ps593r2bY HTTP/1.1

Response:
HTTP/1.1 503 Service Unavailable
Content-Type: application/json;charset=utf-8
X-Description: Maintenance or Shutdown has been initiated on this management server. Can not accept new jobs
...
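
The 503 status shown above lends itself to a simple failover rule on the client or load balancer side. The following is a minimal sketch only: the node names, the `send` callable, and `fake_send` are hypothetical and are not part of this PR.

```python
from typing import Callable, Iterable, Optional, Tuple

SERVICE_UNAVAILABLE = 503  # status returned once maintenance or shutdown is initiated


def first_available(
    nodes: Iterable[str],
    send: Callable[[str], Tuple[int, str]],
) -> Optional[Tuple[str, str]]:
    """Try each MS node in order and skip any node that answers 503.

    `send` is a hypothetical callable that issues the API request against
    one node and returns (http_status, body).
    """
    for node in nodes:
        status, body = send(node)
        if status == SERVICE_UNAVAILABLE:
            continue  # node is in maintenance or shutting down; try the next one
        return node, body
    return None  # every node refused the request


def fake_send(node: str) -> Tuple[int, str]:
    # Stand-in transport for illustration: mgmt1 is assumed to be in maintenance.
    if node == "mgmt1":
        return 503, "Maintenance or Shutdown has been initiated on this management server."
    return 200, "ok from " + node
```

Calling `first_available(["mgmt1", "mgmt2", "mgmt3"], fake_send)` skips the node in maintenance and returns the answer from the next node in the list.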

How did you try to break this feature and the system with this change?

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@codecov

codecov bot commented Feb 18, 2025

Codecov Report

Attention: Patch coverage is 15.77726% with 363 lines in your changes missing coverage. Please review.

Project coverage is 16.30%. Comparing base (1c1dad9) to head (c7f61ba).
Report is 216 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...loudstack/agent/lb/IndirectAgentLBServiceImpl.java | 1.37% | 143 Missing ⚠️ |
| ...java/com/cloud/agent/manager/AgentManagerImpl.java | 2.00% | 49 Missing ⚠️ |
| agent/src/main/java/com/cloud/agent/Agent.java | 0.00% | 46 Missing ⚠️ |
| ...enance/ManagementServerMaintenanceManagerImpl.java | 57.57% | 23 Missing and 5 partials ⚠️ |
| ...c/main/java/com/cloud/utils/nio/NioConnection.java | 46.87% | 13 Missing and 4 partials ⚠️ |
| ...cloud/agent/manager/ClusteredAgentManagerImpl.java | 0.00% | 14 Missing ⚠️ |
| ...udstack/api/response/ManagementServerResponse.java | 0.00% | 12 Missing ⚠️ |
| .../src/main/java/com/cloud/host/dao/HostDaoImpl.java | 16.66% | 9 Missing and 1 partial ⚠️ |
| ...src/main/java/com/cloud/server/StatsCollector.java | 0.00% | 8 Missing ⚠️ |
| ...gent/src/main/java/com/cloud/agent/AgentShell.java | 33.33% | 6 Missing ⚠️ |

... and 9 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #10417      +/-   ##
============================================
+ Coverage     16.17%   16.30%   +0.12%     
- Complexity    13291    13440     +149     
============================================
  Files          5668     5674       +6     
  Lines        498179   499203    +1024     
  Branches      60290    60364      +74     
============================================
+ Hits          80581    81375     +794     
- Misses       408578   408758     +180     
- Partials       9020     9070      +50     
| Flag | Coverage Δ |
|---|---|
| uitests | 3.99% <ø> (-0.01%) ⬇️ |
| unittests | 17.16% <15.77%> (+0.13%) ⬆️ |

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12497

@sureshanaparti
Contributor Author

@blueorangutan test

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-12458)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 51474 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10417-t12458-kvm-ol8.zip
Smoke tests completed. 139 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_11_isolated_network_with_dynamic_routed_mode | Error | 2.29 | test_ipv4_routing.py |
| test_12_vpc_and_tier_with_dynamic_routed_mode | Error | 2.40 | test_ipv4_routing.py |
| test_12_vpc_and_tier_with_dynamic_routed_mode | Error | 2.41 | test_ipv4_routing.py |
| test_06_purge_expunged_vm_background_task | Failure | 386.49 | test_purge_expunged_vms.py |

- block new agent connections during prepare for maintenance of ms

- maintain avoids ms list

- propagate updated management servers list and lb algorithm in host and indirect.agent.lb.algorithm settings respectively, to systemvm (non-routing) agents

- updated setup ms list and migrate agent connections to executor service

- migrate agent connection through executor, and send the answer to the ms host that initiated the migration

- re-initialize ssl handshake executor if it is shutdown

- don't allow prepare for maintenance or shutdown when other management server nodes are in preparing states

- don't allow trigger shutdown when management server is up and other management server nodes are in preparing states

- stop agent connections monitor on ms maintenance

- update avoid ms list in ready command

- updated connected host from the client connection

- update last agents in ms metrics from the database

- updated some agent config descriptions

- update last management server in the hosts during shutdown

- added agents and lastagents in management server response

- updated management server maintenance & shutdown unit tests

- some code improvements
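
The rule in the list above that blocks prepare for maintenance or shutdown while other management server nodes are in preparing states can be pictured as a small guard. This is a sketch under stated assumptions, not the code in this PR: the state names and the `can_prepare` helper are hypothetical.

```python
# Assumed state names, used here for illustration only.
PREPARING_STATES = {"PreparingForMaintenance", "PreparingForShutDown"}


def can_prepare(target, states):
    """Return True if no node other than `target` is in a preparing state.

    `states` maps a management server name to its current state string.
    """
    return not any(
        state in PREPARING_STATES
        for name, state in states.items()
        if name != target
    )
```

With `{"mgmt1": "Up", "mgmt2": "PreparingForMaintenance"}`, a request to prepare `mgmt1` would be refused until `mgmt2` leaves its preparing state.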
@sureshanaparti sureshanaparti force-pushed the ms-maintenance-improvements branch from 0f0f8e7 to 9ef1c12 Compare March 6, 2025 09:39
@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12677

@sureshanaparti
Contributor Author

@blueorangutan test

@rohityadavcloud
Member

@blueorangutan test

@blueorangutan

@rohityadavcloud a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@sureshanaparti
Contributor Author

sureshanaparti commented Mar 11, 2025

> @sureshanaparti, there is some unit test failure on the GHA build. Can you please check if it is an issue?

seems unrelated to ms maintenance unit tests, will check (restarted it with debugging).

@sureshanaparti
Contributor Author

> > @sureshanaparti, there is some unit test failure on the GHA build. Can you please check if it is an issue?
>
> seems unrelated to ms maintenance unit tests, will check (restarted it with debugging).

@shwstppr fixed the test; the shutdown test was calling System.exit.

@apache apache deleted a comment from blueorangutan Mar 12, 2025
@apache apache deleted a comment from blueorangutan Mar 12, 2025
@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12748

@sureshanaparti
Contributor Author

@blueorangutan test

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-12662)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 58072 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10417-t12662-kvm-ol8.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| ContextSuite context=TestSharedNetworkWithConfigDrive>:setup | Error | 1522.79 | test_network.py |

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12805

@rohityadavcloud
Member

@blueorangutan test

@blueorangutan

@rohityadavcloud a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@sureshanaparti sureshanaparti marked this pull request as ready for review March 17, 2025 10:20
Contributor

@kiranchavala kiranchavala left a comment

LGTM, tested manually. Please find the repro steps below.

Issue 1

1. CloudStack should return HTTP 503 when the management server is in maintenance mode and async API calls are made

Before fix


1. Enable maintenance mode on the management server
2. Execute the API call; error 530 is returned


	(localcloud) 🐱 > create volume diskofferingid=51a838c6-1428-4e6d-bbbb-4f774e062719 zoneid=1163fe4e-06f9-4763-bad8-47db23f8c875	
	🙈 Error: (HTTP 530, error code 4250) Maintenance or Shutdown has been initiated on this management server. Can not accept new jobs




After fix

	(localcloud) 🐱 > create volume diskofferingid=2e64ecf3-58a5-4d0c-80cd-43ec93515b35 zoneid=3a42f982-fa53-4bbc-8843-6c3b4b4aaa6c
	🙈 Error: (HTTP 503, error code 9999) Maintenance or Shutdown has been initiated on this management server. Can not accept new jobs

Issue 2

2. CloudStack should migrate the agents associated with a management server when maintenance mode is enabled on that management server. The list managementservers and list managementserversmetrics API calls should list the agents and lastagents in the output.

Steps to verify the feature

  1. Have a CloudStack environment with multiple management servers
  2. Execute the following API calls

(localcloud) 🐱 > list managementservers filter=name,agents,lastagents
count = 3
managementserver:
+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+
|                            NAME                             |                                                                            AGENTS                                                                             |                                   LASTAGENTS                                    |
+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+
| ref-trl-8103-k-mol8-kiran-chavala-mgmt1.sofia.shapeblue.com | ["b2418798-c3d4-453b-b7a7-d3e20cdb5f3d","90bd9ae9-b084-47ef-9415-650d4c1d2fa7","a87cf0ff-f513-4936-ba80-60759bec4bb4","24b86b9f-62a5-427c-92f4-fa8fd42e5135"] | ["b2418798-c3d4-453b-b7a7-d3e20cdb5f3d","24b86b9f-62a5-427c-92f4-fa8fd42e5135"] |
| ref-trl-8103-k-mol8-kiran-chavala-mgmt2.sofia.shapeblue.com | []                                                                                                                                                            | ["90bd9ae9-b084-47ef-9415-650d4c1d2fa7","a87cf0ff-f513-4936-ba80-60759bec4bb4"] |
| mgmt3.sofia.shapeblue.com                                   | []                                                                                                                                                            | []                                                                              |




(localcloud) 🐱 > list managementserversmetrics  filter=name,agents,lastagents
managementserver:
count = 3
+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+
|                            NAME                             |                                                                            AGENTS                                                                             |                                   LASTAGENTS                                    |
+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+
| ref-trl-8103-k-mol8-kiran-chavala-mgmt1.sofia.shapeblue.com | ["b2418798-c3d4-453b-b7a7-d3e20cdb5f3d","90bd9ae9-b084-47ef-9415-650d4c1d2fa7","a87cf0ff-f513-4936-ba80-60759bec4bb4","24b86b9f-62a5-427c-92f4-fa8fd42e5135"] | ["b2418798-c3d4-453b-b7a7-d3e20cdb5f3d","24b86b9f-62a5-427c-92f4-fa8fd42e5135"] |
| ref-trl-8103-k-mol8-kiran-chavala-mgmt2.sofia.shapeblue.com | []                                                                                                                                                            | ["90bd9ae9-b084-47ef-9415-650d4c1d2fa7","a87cf0ff-f513-4936-ba80-60759bec4bb4"] |
| mgmt3.sofia.shapeblue.com                                   | []                                                                                                                                                            | []                                                                              |
+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+
  3. Execute the prepare for maintenance API call

       prepare formaintenance managementserverid=

  4. Execute the list managementservers and list managementserversmetrics API calls again; you will observe that the agents have been migrated to the other management servers.

    Check the agents and lastagents outputs.

    The output of list managementserversmetrics is refreshed after the interval set in the global setting management.server.stats.interval.

(localcloud) 🐱 > list managementservers filter=name,agents,lastagents
count = 3
managementserver:
+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                            NAME                             |                                                                            AGENTS                                                                             |                                                                          LASTAGENTS                                                                           |
+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ref-trl-8103-k-mol8-kiran-chavala-mgmt1.sofia.shapeblue.com | []                                                                                                                                                            | ["b2418798-c3d4-453b-b7a7-d3e20cdb5f3d","90bd9ae9-b084-47ef-9415-650d4c1d2fa7","a87cf0ff-f513-4936-ba80-60759bec4bb4","24b86b9f-62a5-427c-92f4-fa8fd42e5135"] |
| ref-trl-8103-k-mol8-kiran-chavala-mgmt2.sofia.shapeblue.com | ["b2418798-c3d4-453b-b7a7-d3e20cdb5f3d","90bd9ae9-b084-47ef-9415-650d4c1d2fa7","a87cf0ff-f513-4936-ba80-60759bec4bb4","24b86b9f-62a5-427c-92f4-fa8fd42e5135"] | []                                                                                                                                                            |
| mgmt3.sofia.shapeblue.com                                   | []                                                                                                                                                            | []                                                                                                                                                            |
+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+



(localcloud) 🐱 > list managementserversmetrics  filter=name,agents,lastagents
managementserver:
count = 3
+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                            NAME                             |                                                                            AGENTS                                                                             |                                                                          LASTAGENTS                                                                           |
+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ref-trl-8103-k-mol8-kiran-chavala-mgmt1.sofia.shapeblue.com | []                                                                                                                                                            | ["b2418798-c3d4-453b-b7a7-d3e20cdb5f3d","90bd9ae9-b084-47ef-9415-650d4c1d2fa7","a87cf0ff-f513-4936-ba80-60759bec4bb4","24b86b9f-62a5-427c-92f4-fa8fd42e5135"] |
| ref-trl-8103-k-mol8-kiran-chavala-mgmt2.sofia.shapeblue.com | ["b2418798-c3d4-453b-b7a7-d3e20cdb5f3d","90bd9ae9-b084-47ef-9415-650d4c1d2fa7","a87cf0ff-f513-4936-ba80-60759bec4bb4","24b86b9f-62a5-427c-92f4-fa8fd42e5135"] | []                                                                                                                                                            |
| mgmt3.sofia.shapeblue.com                                   | []                                                                                                                                                            | []                                                                                                                                                            |
+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
(localcloud) 🐱 >
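
The comparison of the agents and lastagents columns before and after maintenance can also be checked programmatically. The sketch below assumes the list managementservers output has already been parsed into dictionaries; the parsing, the `agents_migrated` helper, and the shortened host IDs are illustrative assumptions, not part of the PR.

```python
def agents_migrated(before, after, ms_name):
    """Return True if `ms_name` handed off every agent it held before maintenance.

    `before` and `after` map a management server name to a dict with
    'agents' and 'lastagents' lists of host UUIDs.
    """
    had = set(before[ms_name]["agents"])
    now = after[ms_name]
    # The server under maintenance should hold no agents and list the old ones as last agents.
    if now["agents"] or set(now["lastagents"]) != had:
        return False
    # Every agent it held must now be connected to some other management server.
    elsewhere = set()
    for name, row in after.items():
        if name != ms_name:
            elsewhere.update(row["agents"])
    return had <= elsewhere


# Shortened stand-ins for the host UUIDs in the tables above.
before = {
    "mgmt1": {"agents": ["b241", "90bd", "a87c", "24b8"], "lastagents": ["b241", "24b8"]},
    "mgmt2": {"agents": [], "lastagents": ["90bd", "a87c"]},
    "mgmt3": {"agents": [], "lastagents": []},
}
after = {
    "mgmt1": {"agents": [], "lastagents": ["b241", "90bd", "a87c", "24b8"]},
    "mgmt2": {"agents": ["b241", "90bd", "a87c", "24b8"], "lastagents": []},
    "mgmt3": {"agents": [], "lastagents": []},
}
print(agents_migrated(before, after, "mgmt1"))
```

Because list managementserversmetrics is refreshed on an interval, a check like this is more reliable against list managementservers, or after waiting for the metrics to update.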

Issue 3

3. CloudStack should redistribute the agents associated with a management server based on the global settings indirect.agent.lb.algorithm and indirect.agent.lb.check.interval

Steps to verify the feature

  1. Have a CloudStack environment with multiple management servers and agents connected to one management server
  

(localcloud) 🐱 > list managementservers filter=name,agents,lastagents
count = 3
managementserver:
+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+
|                            NAME                             |                                                                            AGENTS                                                                             |                                   LASTAGENTS                                    |
+-------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+
| ref-trl-8103-k-mol8-kiran-chavala-mgmt1.sofia.shapeblue.com | ["b2418798-c3d4-453b-b7a7-d3e20cdb5f3d","90bd9ae9-b084-47ef-9415-650d4c1d2fa7","a87cf0ff-f513-4936-ba80-60759bec4bb4","24b86b9f-62a5-427c-92f4-fa8fd42e5135"] | ["b2418798-c3d4-453b-b7a7-d3e20cdb5f3d","24b86b9f-62a5-427c-92f4-fa8fd42e5135"] |
| ref-trl-8103-k-mol8-kiran-chavala-mgmt2.sofia.shapeblue.com | []                                                                                                                                                            | ["90bd9ae9-b084-47ef-9415-650d4c1d2fa7","a87cf0ff-f513-4936-ba80-60759bec4bb4"] |
| mgmt3.sofia.shapeblue.com                                   | []                                                                                                                                                            | []                                                                              |

  2. Change the value of the global setting “indirect.agent.lb.algorithm” from static to roundrobin/shuffle

  3. Execute the prepare for maintenance API call

 prepare formaintenance managementserverid=

  4. The agents are distributed among the management servers after the interval set in indirect.agent.lb.check.interval

(localcloud) 🐱 > list managementservers filter=name,agents,lastagents
count = 3
managementserver:
+-------------------------------------------------------------+---------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                            NAME                             |                                     AGENTS                                      |                                                                          LASTAGENTS                                                                           |
+-------------------------------------------------------------+---------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ref-trl-8103-k-mol8-kiran-chavala-mgmt1.sofia.shapeblue.com | ["b2418798-c3d4-453b-b7a7-d3e20cdb5f3d","24b86b9f-62a5-427c-92f4-fa8fd42e5135"] | ["b2418798-c3d4-453b-b7a7-d3e20cdb5f3d","90bd9ae9-b084-47ef-9415-650d4c1d2fa7","a87cf0ff-f513-4936-ba80-60759bec4bb4","24b86b9f-62a5-427c-92f4-fa8fd42e5135"] |
| ref-trl-8103-k-mol8-kiran-chavala-mgmt2.sofia.shapeblue.com | ["90bd9ae9-b084-47ef-9415-650d4c1d2fa7"]                                        | []                                                                                                                                                            |
| mgmt3.sofia.shapeblue.com                                   | ["a87cf0ff-f513-4936-ba80-60759bec4bb4"]                                        | []                                                                                                                                                            |
+-------------------------------------------------------------+---------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
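
To build intuition for a split like the one in the table above (two agents on one server, one on each of the others), here is a rough sketch of a round-robin style ordering. It assumes each agent receives the management server list rotated by its own index and connects to the first entry; this is an illustration only and may differ from how the roundrobin setting is actually implemented in CloudStack. The function names are hypothetical.

```python
from collections import Counter


def preferred_order(ms_list, agent_index):
    """Rotate the MS list so that different agents prefer different servers."""
    shift = agent_index % len(ms_list)
    return ms_list[shift:] + ms_list[:shift]


def spread(ms_list, agent_count):
    """Count how many agents would pick each server as their first choice."""
    return Counter(preferred_order(ms_list, i)[0] for i in range(agent_count))


print(dict(spread(["mgmt1", "mgmt2", "mgmt3"], 4)))
```

With three servers and four agents, the first server is chosen twice and the other two once each, which has the same shape as the distribution reported above.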


Issue 4

Events added for management server maintenance mode

Screenshot 2025-03-17 at 4 14 25 PM

@blueorangutan

[SF] Trillian test result (tid-12717)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 59759 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10417-t12717-kvm-ol8.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| ContextSuite context=TestSharedNetworkWithConfigDrive>:setup | Error | 1519.61 | test_network.py |

@sureshanaparti
Contributor Author

Merging this based on the review & tests.

@sureshanaparti sureshanaparti merged commit 9dceae4 into apache:main Mar 19, 2025
25 of 26 checks passed
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Jun 19, 2025
* Update last agents during ms maintenance, and some code improvements

* Send 503 (Service Unavailable) response status when maintenance or shutdown is initiated
[Any load balancer in the clustered environment can avoid routing requests to this MS node]

* Migrate systemvm agents before routing host agents, and some code improvements

* Added events for ms maintenance and shutdown operations

* Added the following ms maintenance and shutdown improvements

- block new agent connections during prepare for maintenance of ms

- maintain avoids ms list

- propagate updated management servers list and lb algorithm in host and indirect.agent.lb.algorithm settings respectively, to systemvm (non-routing) agents

- updated setup ms list and migrate agent connections to executor service

- migrate agent connection through executor, and send the answer to the ms host that initiated the migration

- re-initialize ssl handshake executor if it is shutdown

- don't allow prepare for maintenance or shutdown when other management server nodes are in preparing states

- don't allow trigger shutdown when management server is up and other management server nodes are in preparing states

- stop agent connections monitor on ms maintenance

- update avoid ms list in ready command

- updated connected host from the client connection

- update last agents in ms metrics from the database

- updated some agent config descriptions

- update last management server in the hosts during shutdown

- added agents and lastagents in management server response

- updated management server maintenance & shutdown unit tests

- some code improvements

* refactored code / addressed comments

* removed shutdown testcase (maybe, calling System.exit)

* Revert "removed shutdown testcase (maybe, calling System.exit)"

This reverts commit e14b071.

* avoid system.exit during shutdown test

* code improvements

* testcase fix

* Fix cutoff time in agent connections monitor thread